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Abstract 


A hierarchical latent regression model is suggested to estimate nested and nonnested relationships 
in complex samples such as found in the National Assessment of Educational Progress (NAEP). 
The proposed model aims at improving both parameters and variance estimates via a two-level 
hierarchical linear model. This model falls naturally within the set of models used in most large 
scale surveys, in that all of them are special cases of the hierarchical latent regression model. The 
model parameter estimates are obtained via the expectation-maximization (EM) algorithm. An 
example with NAEP data is presented and results of parameter estimation and standard errors 
are compared with results from operational procedures of NAEP. 

Key words: Hierarchical linear models, latent regression models, maximum likelihood estimates, 
EM algorithm, item response theory, NAEP 


1 



Acknowledgments 


The authors would like to thank Dr. Shelby Haberman and Dr. John Donoghue for their 
suggestions, reviews, and discussions with the authors for a variety of issues involved in this 
project. We thank Catherine McClellan, Tanias Antal, and Dan Eignor for their reviews and 
comments. 


n 



1 Introduction 


Hierarchical linear modeling (HLM) is a well-known framework particularly appropriate for 
complex stratified and clustered samples with balanced or unbalanced designs and varying degrees 
of missing data patterns (Raudenbush & Bryk, 2002). This family of models has proven to be 
useful especially for the analysis of educational data from complicated designs involving survey 
sampling. Examples of cognitive educational surveys with complex designs are the National 
Assessment of Educational Progress (NAEP) and the Trends in International Mathematics and 
Science Studies (TIMSS). In these programs students are sampled from schools, and schools are 
sampled from participating states or countries following a stratification scheme. Stratification 
in this context means that the sample is drawn from each of several relatively homogenous 
strata with respect to characteristics such as median household income and level of urbanization. 
Furthermore, blocks of items are systematically sampled across students to assess a large amount 
of information in a relatively short amount of student testing time. Hence, students do not 
answer all test questions, and relatively little information is obtained from individual students. 
Justification for this design stems from the reporting level of these programs, being proficiency 
estimates of student populations (e.g., male students, students in a particular state or country). 

The prevalent operational model used to estimate student population proficiencies is a latent 
regression model (Allen, Donoghue, & Schoeps, 2001; Mislevy, 1984, 1985). Population groups 
indicators are regressed onto a latent proficiency, which is characterized by a collection of item 
response theory (IRT) models (Birnbaum, 1968; Lord, 1980; van der Linden & Hambleton, 1997). 
Suppose that the ability of student i in subscale t is denoted as 6u, where i = 1, • • • ,n and 
t = 1, • • • , p. For example, in the NAEP mathematics assessment p equals 5, corresponding to the 
subscales: (a) Numerical Properties and Operations; (b) Measurement; (c) Geometry; (d) Data 
Analysis and Probability; and (e) Algebra. Furthermore, 8 is a latent variable and, hence, the 
latent regression assumes the following form: 

0 it = 7 t Xi + £i, ( 1 ) 

where *y t is a vector of Q regression coefficients for scale t, that is, j t = ( 71 *,--- ,7 Qt)' ■ 
Furthermore, Xi is a vector of Q population groups indicators and e% is a residual term, assumed 
to be normal distributed. Student latent proficiencies 6 t = {0 t 1 , • • • , 9in)' can be inferred from 
student item responses Y t = (y lt , • • • ,y^ t ) through item response theory. The complete latent 
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regression model assuming a normal distribution for 6j given /i t t = 'Jt x i can be estimated through 
the following likelihood: 

N / / M \ \ Wi 

Lt = n (( llP(yiht\0it,P h ) J ( 2 ) 

where P denotes the probability of observing response ytht given latent ability 9j and IRT item 
parameters (3. Furthermore, (j) denotes a normal density with variance of. The product operators 
imply that items are assumed to be independently observed and that students are independently 
observed. This is obviously not true in a complex sample, and therefore post-hoc operations are 
conducted in NAEP, using jackknife repeated replications (JRR), to derive appropriate standard 
errors. Several alternatives have been proposed. 

Wilson and Adams (1995) and Adams, Wilson, and Wu (1997) have formulated a multilevel 
model to concurrently estimate latent regression and IRT parameters. Also, Raudenbush, Fotiu, 
and Cheong (1999) have applied a multilevel model to the plausible values produced in the NAEP 
program, which are imputations from the posterior ability distribution. Furthermore, Johnson 
(2002) and Johnson and Jenkins (2005) have developed a Bayesian framework for multilevel 
IRT with NAEP data that include a Markov chain Monte Carlo estimation procedure and also 
concurrent estimation of item and population parameters. Finally, Aitkin and Aitkin (2005) have 
used generalized mixed models to estimate four-level models with NAEP-type data. 

These alternatives have in common that they work quite well with very small samples and 
small regression models. However, larger data sets corresponding to typical NAEP samples 
become rapidly intractable to compute. The only exception is Raudenbush et al. (1999), although 
the extent to which the multilevel structure is captured in the plausible values is questionable. In 
this study, a less ambitious approach has been taken to estimate a random effects parameter across 
schools in the population model. When students are selected from the same school, observations 
of school and teacher characteristics are quite likely to be related, violating the assumption of 
independent observations that is required in NAEP’s latent regression models. More importantly, 
as Raudenbush and Bryk (2002) pointed out, students in the same school share values on many 
more variables, some of which are not observed, which means that the variables tend to disappear 
into the residual term of the linear model, causing correlated disturbances. 

There are several other clustering levels in educational surveys like NAEP, such as states or 
countries, classrooms, and primary sampling units (PSUs). These are ignored in this study for 
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several reasons. For classrooms, sampling often occurs across subjects (e.g., reading, mathematics) 
and across students. In practice, not enough data is available to support such detailed estimation. 
For states and PSUs, the situation is more complex. In state-to-state samples, PSUs coincide 
with schools and a separate model is applied to each state after a common measurement scale 
is determined. In other words, the state level of clustering is implicit. For national samples, 
geographic location is more likely to be an important clustering variable, although stratification 
may be applied at this level as well, hence reducing the effect of clustering. 

The first section of this paper will introduce the structure of a latent hierarchical linear model 
that is appropriate for NAEP data, followed by a discussion of model parameter estimation via the 
expectation-maximization (EM) algorithm. Also, estimation of standard errors is discussed and 
the model is extended to the multivariate case. The univariate case is illustrated by an application 
of a two-level model to NAEP 2004 age 17 long-term trend mathematics data, and the resulting 
parameter estimates are compared with parameters from the current operational NAEP approach. 
The paper concludes with a discussion of these results. 

2 Hierarchical Latent Regression Model 
2.1 Hierarchical Linear Models 

The primary focus of this study is hierarchical modeling with latent variables. To facilitate 
the discussion of this model, we will use a simple two-level latent regression model with predictors 
only on the first level. This model can be considered a special case of a two-level hierarchical 
latent regression model (HLRM), which may or may not be more appropriate for the problem at 
hand. Extension of this model to a general HLRM is also straightforward. Since the discussion 
of hierarchical models involves sampling clusters, the notation of clusters will be added to 
indicate hierarchical relations and a nested data structure. Hence, there are rij students nested 
within school j for j = 1. • • • , .7, and their proficiencies 6\jt to 0 nj jt are likely to be positively 
correlated, since they share a common curriculum and instructional experience. Furthermore, 
let Xij be the vector of Q population groups indicators with an additional cluster index, that is, 
Xij = {X VJ \, Xij 2 , • • • , X-ij q ) for student i in school j. Correspondingly, the regression effects for 
school j and subscale f is 7 = ( 7 ^ 1 , • • • , IjtQ)'- The item responses for student i in school j 
on subscale t is denoted by y %3t . A two-level model describes linear relations between the latent 
proficiency and the population groups indicators where regression coefficients are random effects 
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across school j. The Level 1 model is: 


@ijt — Xij'ljt + ~ijI • (3) 

The residual term E- t jt is assumed to be £jjt r\j N( 0, of). The Level 2 model is: 

Ijt = It + u jt • ( 4 ) 

The coefficients ~f t in the above model (4) without the second-level notation implies that the 
overall regression effects (a vector of Q regression effect parameters (i.e., 7 f = ( 7 ti, ■ ■ ■ , 7 tQ)')) are 
fixed, that is, they do not vary across sampling units. 

With the second-level model, predictors can be added to model the random regression 
effects. Furthermore, this model can be made quite flexible by estimating some regression effects 
with second-level predictors and some without and possibly by allowing a subset of regression 
parameters 7 j t to be considered random effects Uj t and others to be considered fixed effects. 

In this paper, a simple unconditional model on the second level (i.e., the model in (4) without 
predictors) is used primarily for demonstration purposes. Moreover, in the simple model, -y t is 
an overall estimated mean of regression effects across all clusters, taking into account school or 
cluster random effects. Hence, it is expected that the HLRM model will help in the estimation 
of relations between student abilities and the group indicators. The random effects parameter 
Ujt is assumed to be Uj t ~ IV(0, Tf) for j = 1, 2, • • • J and t = 1, • • • ,p. A rationale for treating 
cluster regression effects as random effects is that NAEP does not report individual school effects 
but rather group proficiency for the population of students. Because participating schools as well 
as their students are sampled from this population, there is primarily interest in the variance 
component due to the complex sample and not in specific school or student effects. 

Substituting the model in (4) in (3) gives the combined model 

Oijt — Xij'/t T XijUj t -f- Ejji . (5) 

The marginal variance of Oijt is x-^Txf + a 2 , where x tJ Txf is the variance component due to 
random effects Uj t , attributable to the variation across selected schools. Also, of depicts the 
variation among students within schools. In this model, variation is decomposed into a school- 
and a student-level component. 
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2.1.1 Multivariate Case 


In the multivariate case, student proficiency 9,j is a p-dimensional vector (i.e., 9jj E $t p 
and 9{j = ■ ■ ■ , 9j.jp) 1 )- Regression effects on p subscales can be represented as a matrix 

r = (7 i|7 2 | • • • |7 p ), where each column is a vector of regression effects for a particular subscale or 
dimension. The random school effects are represented as Uj = [uj\\uj 2 \ ■ ■ ■ \Uj p ), and the residuals 
are a column vector of p subscale residuals e VJ = (e^i, • • • , £ %rp )' ■ Subsequently, the combined 
multivariate model is a hierarchical linear regression among p-dimensions and can be written as 

9 ij — XijT' T XijUj T Sjj . (6) 

Alternatively, this model can be written by using matrix notation that reflects only the 
second-level and/or subscale indices. For example, in the univariate case, for subscale t = 1, • • • ,p, 
the proficiencies of all students within a second-level unit (e.g., in school j) can be expressed as 
9jt = ( 9\j t , • • • , dnjjtY, and for group indicators Xj = (x'i j, • • • , x' nj j)'. Furthermore, the residual 
terms are £jt = (sijt, * • ■ , £ nj jt)', leading to a model for school j 

9jt = Xj-ff + XjUjt + £jt■ (7) 

This notation is used frequently throughout this report. In addition, the model is also expressed as 

Q Q 

Oijt = E Xijq'fqt T / ^ XijqUqjf T £ijt (8) 

q= 1 9=1 

when discussing computational details. 

2.2 Estimation of HLRM Parameters 

In NAEP, the sample is stratified both at the school- and at the student-level, resulting in 
unequal sampling probabilities. Let Wij be the sampling weight for student i in school j. While 
there is much discussion about the use of sampling weights within the HLM framework (e.g., 
Asparouhov, 2005), they are included here in HLRM models in order to capture differences 
between sampling and population distributions regardless of the complex nature of the sample. In 
the following sections, the marginal likelihood estimates (MML) of the model parameters will be 
discussed as part of an EM algorithm. For subscale £=!,••• ,p, the log likelihood function L over 
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all individual students and test item responses is 


L = log {YlYlPiy^lxij-y^XijTtx' 

\j = 1 i =1 


.. I 2\Wij 

ij + a t ) 


j= 1 i =1 


EE Wijlog ( / P(y ijt \6)(/)(e\x ij 'y t ,Xi j T t x' ij + a?)d0) . 


In the maximization step, the parameters j t , of, and T t given the data ( Uj t ,y i j t ) for i = 1, 
j = 1, • • • , J and t = 1, • • • ,p can be estimated by (see also the appendix) 


v - EE 

\j =1 i=l 


W{jX ijXij 


J n 3 

EE WijX ij (Oijt 'Ujt&ij ) i 

j =1 i=1 


^2 _ Sj=l Yli=l + Ylj=l X)i=l w ij{0ijt x ijlt. x ij u jtY 
“ Eli EEi«>« 


-Tf — J '' jt> (12 

3 =1 

where 6ijt in (10) is the posterior mean of student proficiencies and <5f jt in (11) is the posterior 
variance. In the expectation step (E-step), these moments can be found following 

&ijt = f 9ijtp(0ijt\yijt)d0ijt, (13 


&ijt ~ J (@ijt @ijt) P ( Qijt\lJijt)ddijt j 

where the posterior density p(^r/i|^0 can be expressed following Bayes theorem as 

iy. _ P(yijt\0ijt)4 > (Qijt\ x ij1ti x ij r I't x ij~\~ o 't) 

P lJt f p(y ijt \0ij t )(f>(0i jt \xi j 'y t , xijTtx'ij + a^dOijt ’ 


Furthermore, is a covariance matrix of random school effects 


/ Till Til 2 ••• TtlQ 

rji _ Ti21 Ti22 ••• Ti2Q 


TiQl T t Q2 


where the diagonal elements Tt qq indicate the variance of random regression effects of r y q tj for 
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q = 1, • • • , Q and t = 1, ■ ■ ■ , p across all schools j = 1, • • ■ , J. T tqq i and the off-diagonal elements 
indicate the covariance between two random effects 7 q tj and 7 q nj for q, q' = 1, • • • , Q. 

Extension of the maximization step to the multivariate case is straightforward. The estimates 
of T, E, and T = Var(Uj) for t = 1, • • • ,p and j = 1, • • • , J can be estimated by 

-1 


It = 


J n 3 

EE 


WijX ijX Z j 


J n 3 

EE IV ijX ij (Oijt 'Ujt&ij ) ? 

j =1 i=1 


^ _ Z^=l Yli =1 w ij^ij + Sj=l YliLl w ij£ij£ij 


E J \~^ n j 

j =1 Di=1 


(16) 


^=1 *=i 

for each subscale t = 1, • • • ,p. Collecting the estimates of j t will yield T = [ 71 , • • • ,7 ]: 


(17) 


f = 


1=1 


(18) 


where ey/ = (Oij — XijT — XjUj) in (17). Furthermore, the posterior mean is now a vector of 
posterior means, and is the posterior variance-covariance matrix of Ojj. The covariance matrix 
T in (18) has dimension pQ x pQ. The diagonal block matrix in T is Ttt for t = 1, • • • ,p and the 
off-diagonal block matrix Tt s is the covariance matrix of school random effects between subscale 
t and subscale s, for s,t = 1, • • • ,p. Hence, the matrix T can be further written in terms of the 
variance and covariance of random school effects among p subscales as the following block matrix: 


T = 


( T “ T 12 

T21 T22 


V T 


pi p2 


Tip \ 

T2p 

T pp ) 


with 


rr\ J- \ / 

Tis = y u jtU j s , 


3 7 = 1 


(19) 


and 


Tt, = f 


St 


( 20 ) 


for t / s and t,s = 1, • • • ,p, denoting the covariance matrices of random school effects between 
subscale t and s. In the multivariate case, the covariance of the effects between any two subscales 
t and s for t 7 ^ s, t, s = 1, • • • ,p, is a Q x Q matrix with elements 
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/ Tt Sj ll Tts, 12 ••• Tts,lQ 

rfi _ Tts 21 Tits,22 ••• T~ts,2Q 

J- ts — 

\ T~ts,Q\ T~ts,Q2 ‘ ‘ ' T~ts,QQ 
where Ttt is defined in (18), representing the covariance matrix of school random effects for 
subscale t. Therefore, there are p variance matrices Tf and p ^ p f 1 ^ covariance matrices Tf S resulting 
in p y p matrices to be estimated in order to obtain T in the multivariate case. 

2.2.1 Conditional Moments for the Univariate Case 

Individual student proficiency 0 (i.e., Op , Opt) and the random school effect U (or up ) are 
unobserved quantities. However, conditional expectations can be obtained given observed response 
data Y (i.e., y lt , y t p) and parameter estimates from a previous EM cycle 7 t ,d 2 ,Tt- Specifically, 
student latent proficiencies Op and school effects Up are treated as missing data for j = 1, • • • , J 
and t = lp ■ ■ ,p. If ( Op , up) was observed, finding the MML estimates would be straightforward. 
For notational convenience, let including the observed responses yj and 

group indicators Xj, as well as previous parameter estimates from a previous cycle 7 ^., er^, T^. 

The posterior mean and variance for student abilities can be obtained through numerical 
integration using (13) and (14). Hence, to complete the E-step of this algorithm, an expression for 
the conditional expectations for E{up\£l) and E (Oij — x VJ ^ t — XijUp) 2 \£l to estimate a 2 , and 
E(upu'p\fl) to estimate Tf are needed. The question is how these conditional expectations can 
be found. 

For the term [Op — Xij^ t — XijUp) 2 in (11) it follows that 

(Oij Xjjp 1 XijUjt) — (Ojj Xjjpj ) - 2 (Ojj Xjjf )XjjUj 1 -f- (x ijUij ) . (21) 

Therefore, finding the conditional expectation for E[(0ij — Xp^f t — XijUjt) 2 \fl\ only requires the 
conditional expectations E(up\d)), E[(xijUij) 2 \Q], and E(ujtu 'Because 

E(upu'p\to) = E{up\n)E{u jt \ny + Var(up\fl), (22) 

the evaluation of E(ujtu'p\£l) relies on the conditional expectation E(ujt\fl)) and variance 
Var(ujt)fl) of Ujt. Therefore, in the E-step the conditional expectations and variances of missing 
data Ujt , as well as the conditional expectation of the quadratic term ( x^up ) 2 given f 2 , need 
to be evaluated before computing parameter estimates for each M cycle. That is, each E-step 
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requires the evaluation of E(6j t |fX), Var(6j t |fi), E{v,jt |f2), Var{ujt\ft) , and E[(xijUj t ) 2 \fl\. They 
are the elements necessary for the computation of expected sufficient statistics for 7 t ,a 2 ,Tt, 
provided that the complete data were observed. Substituting the expected complete data sufficient 
statistics into formulas in (10), (11), and (12) in the M-step yields improved estimates in terms 
of the likelihood. Repeating the E- and M-steps until convergence is achieved yields maximum 
likelihood estimates (Dempster, Laird, & Rubin, 1977). 

Let Gjt = ( Oijt , • • • , 0 inj t) and Var(6j t \fl) = Sj, denoting a diagonal matrix with elements 
Var(6ijt\Y), ■ ■ ■ , Var(6 nj j t \Y). The conditional mean E(uj t \fl), the conditional variance 
Var(ujt\fl) and the conditional expectation for the quadratic form E[(xijUjt) 2 |f2] depend on 
the joint distribution of Utj and latent abilities Ojt- The joint distribution of ( 6j,ujt) given 
X is assumed to be multivariate normal with mean vector (X y ' 7 t , 0 ) and covariance 


matrix 


Cov(6j t ,Ujt ) = 


a 2 ! + XjTtX'j X ,7 


T'tX'j 


^ t 

T t 


Following the proof of Raudenbush and Bryk (2002, pp. 442-443), the conditional expectation is 
given by 


E(ujt\9 jt ) = T t X'j{a 2 1 + XjT.X';) '[(),, - X jlt ) 

= C- t l X’^O jt -X jlt ), (23) 

where I in (23) indicates an identity matrix of size j for j = 1, • • • , J, and 

C jt = X'jXj + a 2 ^ 1 . (24) 


The conditional variance of Ujt given 6 j can be expressed as 


Var{u jt \0j) = T t - T t X',(<$1 + X/LX',) : X ; T, 

~ c jt a t ■ 


(25) 


Denote the conditional expectation E(ujt\6jt) as Ujt, which can be evaluated through (23) for 
each school. The values of Ujt and Var(ujt\6jt) can be obtained prior to observing item response 
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data Y. The conditional expectation of Ujt given Q can be expressed as 

u jt = J u^J p(u\e,Y)P(e\Y,al-r t ,T t )de^jdu 

= J J uP(u\9)P(9\Y , al lt ,T t )dud9 

= j Cj'X' j (9 jt -X j 'y t )P(9\Y,<Tl'y t ,T t )d0 
= C^X'j{9 jt -X jlt ). 

Also, the posterior mean Uj t in (26), E{^2, J - 1 Uj t u'j t ) can be expressed as 


( 26 ) 


E J ^2 Ujtu jt 

<3=i 


UjtUjt + Var ( u jt 
3 =1 3=1 


(27) 


where Var(v,jt\Xj, Y, a^, Tt) is the conditional variance for each random school effect. Write 
Var(ujt\Xj,Y,cr‘t,'y t ,T t ) as Var(uj t \fl) for simplicity, which can be integrated and expressed in 
terms of posterior variance of 9jt, denoted as Sp This is a diagonal matrix with elements equal 
to the posterior variance for each student within a school or cluster: 

Var(u jt \fl) = J\u jt - Ujt)(u jt - u jt )' (^ j P(u\9)P(9\fl)d9^j du 

— j jUjt)(uj t Ujt) + Ujt)(ujt Ujt) \P[u\9)P[9\£V)d'iid9 

= Var(ujt\9j) + J J [{u jt - Ujt)(ujt - Ujt)'}P{u\9)P(9\Cl)dud9 


= c jt 1(J t + 


C P X '3 


C it 1X > 


(28) 


To evaluate the conditional expectation E[(xijUjt) 2 |f2], first write it as a quadratic form, 
E[(xijUjt) 2 |f2] = E(u'jtx'ijXijUjt\El). By a theorem of the quadratic expectation (e.g., Stapleton, 
1995, p.51), 


E(u j / x ijXijUjt |1"2) — tv dec [x ijXjj Var( Ujt u jt•tt ijXijUjt 

— 't' ij E'it{U jt | ) x ij T u j / x jj Xjj Ujf . 


(29) 


Combining the expressions of ujt in (26), Var(ujt\Ei) in (28), and the quadratic term 
E[[xijUjt) 2 \Ei\ in (29) and substituting these into (21) yields E[(9ij t — Xij-y t — XijUjt) 2 \ft], which 
can be written as 


E\(9ijt Xij~f t XijUjt) |f2] — (@ijt •Eij^ft) 2{9ijt 'Eij1ft)XijUjt T 

Xij Var( ii’ji 1Id) x ij -f- u j^x ijXijUjf. 


(30) 
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The expressions of conditional expectation and variance for Ujt can be used to complete the 
computation of parameter estimates for each EM cycle. Note that the estimation of the regression 
parameters i t , the covariance matrix of the population proficiency distribution of, and the 
covariance matrix Tt of random effects in the M-step only depend on the evaluation of the 
conditional mean and variance of 0,jt in the E-step, because the conditional expectations E{u- t \Et) 
and E[(xijUjt) 2 \fl)\ and the conditional variance Var(ujt\Q) can be written as a function of the 
conditional moments of Ojt- Therefore, no additional numerical integration is needed to compute 
E(u- t \El) and Var(uj t \fl), and (15), (13), and (14) are used to obtain these quantities. In this 
case, a straightforward Simpson rule numerical quadrature integration is used to carry out the 
computations. 


2.2.2 Summary for Univariate Case 

The EM algorithm for maximum likelihood parameter estimation in HLRM models for 
the univariate case can be summarized as follows. The (r + l) th M-step can be completed by 
computing 

( J n i \ _1 J n i 

EE WijX'ijXij EE WijX ij{ 0 ijt UjfXjj ), (31) 

3=1 i= 1 / 3=1 i= 1 


a r +1 — 


Ylj=l Si=l + Xf=l Si=l w ijE[(0ijt 


X 


■jit - XijUjt) 2 |n r ] 


1 Yli =1 w ij 


(32) 


Tt — J '£ujtUjt+ j Var(ujt\El r ). (33) 

3=1 3=1 

The E-step can be completed by computing the posterior moments from (13) and (14) and, 
subsequently computing iijt using (26), E[{9ijt — x V} i t — XijUjt) 2 \fl r \ using (30) and Var(ujt\Cl r ) 
using (28). Furthermore, $T r is Q at iteration r. The E- and M-steps are alternated until 
convergence is achieved. 


2.2.3 Conditional Moments for the Multivariate Case 

It is straightforward to extend the results above to the multivariate case, in which 
the multivariate conditional expectation E[(6jj — XijT — X; fj — x tj T — XijU j)\fl\ 
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needs to be evaluated. Accordingly, U, is in this case (X, Y, T, X, T). Denote the product 
(Qij — XijT — x.jjUj)'(0ij — XijT — XijUj) as A. Then 

a — ( @/j (9 ij x';/T ) ( e. l3 r) XijUj u jX i-j ( Ojj Xij r) 

+U'jx' ijXijUj. ( 34 ) 

The conditional mean E[Uj\fl\ = Uj is simply the collection of school means for each subscale 
(i.e., Uj = [t*ji|w i2 | ‘ ‘ ‘ | Ujp]). The posterior variance Var(Uj\fl) can also be viewed as a collection 
of posterior variances and covariances of random school effects among subscales. That is, the 
posterior covariance matrix Var{Uj\fl) takes on the following form: 



( fu 

fia • 

■ f jp\ 

Var(Uj\Q) = 

t 2 , 

T22 • 

■ T 2p 


V T pl 

Tp2 ■ 

Tpp / 


The multivariate conditional expectation and subscale specific variance matrices of random 
effects can be obtained through (18). Hence, in the multivariate case P ^ P 2 ^ additional conditional 
covariance matrices T ts = cov[(uj t ,Uj s )\fl] between two subscales for t, s = 1, • • • ,p have to 
be computed, as well as the E ^{U'jX 1 ijX^Uj)\El\. Furthermore, A contains the p x p matrix 
U'jX'^x.ijUj. The diagonal elements can be found by (29) following a quadratic form, while the 
off-diagonal elements can be found by 

E\u ,js x ijXijUji |S~2] — cov\Uj s x jj , XjjUji |^] T Uj s x / / Xjj'u>jt, 

= Xij Cov[Ujt,Ujs\n]x'ij + u’j s x'ijXijUjt 
— XijTgt.x ij E U js x ijXijU jt, (35) 

where T s t is the posterior covariance between subscale s and subscale t, for s,t = 1, • • -p- 
In (29) and (35), two conditional covariances appear: cov[(xijUj tl XijUj t )\fl\ and 
cov[(xijUjt, XijUj s )\Q], The conditional variance ft of Ujt for subscale t given Q can be evaluated 
by (28), while the expression for cov[(xijUjt, XijUj s )\fl] will be more complicated to obtain. 
Primarily, T s t has to be computed. 

First, examine the joint distribution of 9jt, Oj s ,Ujt, and Uj s . The joint distribution of student 
proficiencies within school j and between two subscales 6jt, 9j s is assumed normal with mean 
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vector (Xjj t ,Xj'y s ), and variance matrix, 


M = Cov(O jt , 0 js ) = 


a 2 ,1 - X/FfX) (T 2 st I + X jT ts X'i 


.j± feAj 


a 2 st I + XsT'tsX'i a 2 s I + X,T S X', 


^ jJ- ts^- j v s-l ~r s^*- j 

where T s t = Cov(ujt,Uj S ), the covariance matrix for random effects between the two subscales. 
Similarly, the joint distribution of school random effects between two subscales Ujtanduj s is 
assumed normal with mean vector ( 0 , 0 ) and covariance matrix 


G = Cov(ujt, Uj s ) = 


T t T ts 


T' 


ts 


Furthermore, let 'J' =cov[(6j t ,6j s ), (ujt,Uj s )] so that 


= 


X jT t X 3 T ts 
T'tsX'j XjT s 


because Cov(ujt, £ij s ) = 0 by definition, F is a covariance matrix of student proficiencies and 
random effects for school j. Also, the joint distribution of Ojt, 0j s , Ujt, Uj s is assumed normal with 
mean (Xj~y t , Xj 7 S , 0 ) and covariance matrix 


B = 


M \F 
V G 


Subsequently, the conditional variance of school effects is 


Var[(u jt ,u js )\(0 jt ,0 js )] = G-V'M^, 


(36) 


using the same theorem as in (23). Furthermore, it can be shown that 


Cov(ujt, Ujs) — E [ Cov{ujt , Ujs) | Ojt , 0js\ cov \E{ujt | Ojt , 0j S ) : E(uj S \0 jt, Ojs)] • 


(37) 


The covariances in (37), cov [E(uj t \0j t , 0j s ), E(uj s \6j t , 0j S )], can be evaluated by substituting (23) 
into the following expression 


Cov \E{uji|Ojti 0js), E(uj s | Ojt , 0js)\ — Cov \ujt , Uj S 


= Cov 


Cjt'X’jiOjt - X jlt ), G.jx'yj),, - x n 


= Cj^X'j [a ta ® I\ Xj 


r~ l 
^ is 


(38) 
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where at s are the covariance components of the matrix XI. Then, from (36),(37), and (38), the 
expectation of the conditional covariance E [Cov(ujt, Uj s )\6jt, 0j S ] can be expressed as 


r -i 
^js 

r - 1 

^ js 


E [Cov(uj t , Uj S )\0j t , 0j S ] = G - ^>'M 1 'F 
= Cov(ujt, Uj S ^ T Gj-f- X j \(Jis (J) -7] Xj 
= T st + Cj'X'j [ats ® I\ X j 

Finally, the conditional covariance Cov[(ujt,Uj s )\£l] is evaluated by 


(39) 


Tt s — Cov\(ujt 1 Uj s ) |f2] 

= Cov[(iijt , Ujg^Ojt, 6j s \ + C j t X j ® I~\ X 


C 


-i 

jt 


— Tst + Cj?X'j ( (Tts + Vij(t,s )) ® I_ Xj 


c 


-1 


(40) 


with the same method as used in (28). Notice that is a posterior covariance component in 

the posterior covariance matrix which is given by 


/ 



^4/(1,3) 



0-jj(2,l) 

2,2) 

^(2,3) • • 


V 

&ij(p, 1) 

%(P>2) 

^ij{p, 3) 

°ij(p,P) / 


Hence, to obtain the conditional expectations for T s t for the multivariate case, the result from (40) 
has to be substituted back into (35). 


2.2.\ Summary for Multivariate Case 


The steps of the EM algorithm for maximum likelihood parameter estimation in the 
multivariate case can be summarized as follows. The (r + l) th M-step can be completed by 
computing 


J n 3 \ 1 J flj 

Tt,r+1 iEE WfjX ijX{j EE 'UJijX ij(0ijt UjfX%j ) 5 

j=l i= 1 / j =1 i =1 


(41) 


* , r+l — 


Sj=l X)i=l w ij^ij + X)j=l X)i=1 
X)j=i X)t=i w ij 


(42) 


and 


tt,r+l 


J 


Ujt.rUjt r + j Var(tijt )r |ri r 


j=i 


(43) 


j=i 
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The E-step can be determined by computing the multivariate posterior moments and, subsequently, 
Uj = (uj i, • • • , Uj p ) through (26), i tJ = Oij — Xij'y — XijUj , E[i' ij £ij\Q, r \ through (30), and 
Var[ujt >r \^ir) through (28). Also, Ujt,r and are the conditional mean for Ujt and Q, 
respectively, at iteration step r. 

Equation (43) gives the estimates of p matrices Tt,r- i-i at step r + 1. The other p ^° 9 ^ 
covariance matrices of school effects between subscales t, s = 1, • • • ,p are 

'i't.s,r -\-1 — ~J ^ ' Ujt,r‘U'js,r ^ ' Cov j U'js,r) |^r] ) (44) 

3 = 1 i =1 

using (40) to obtain an analytic expression for C'ou[(ii^ )r ,tij S)r )|fi r ]. The E- and M-steps are 
alternated until convergence is obtained. In sum, the general structure of the multivariate case is 
similar to the univariate case, except for the additional computation of the posterior covariances 
T s t of school effects. 


3 Standard Errors 

As briefly mentioned in the introduction, programs such as NAEP often make simple random 
sample assumptions during parameter estimation and then apply a post-hoc complex sample 
estimator to obtain appropriate standard errors. Hence, initial standard errors of parameters are 
computed based on simple random sample theory. There are some concerns that deserve further 
study to improve the estimation of standard errors for regression effects in the current NAEP 
analysis. Besides ignoring the complex sample structure, affecting both parameter estimates 
and their standard errors, an approximation is also employed that is governed by assumptions 
that may not satisfy the complicated NAEP context (e.g., a normal posterior distribution of 
6ijt). Before the standard error computation under the HLRM framework is discussed, a brief 
introduction of the current NAEP method is first provided below. The discussion will be limited 
to the standard errors of - y. 

3.1 Standard Errors in NAEP 

The standard errors of the regression effects are estimated by summing (a) the sampling 
variance and (b) a variance component that reflects the uncertainty due to the latency of 
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proficiency. Specifically, 


Var{ %) « Vartf t \X,Y t ) 

= E[Var(%\X,Y t ,e t )} + Var[E( lt \X ,Y t ,0 t )}. (45) 

The first component is estimated assuming that examinees are selected from a simple random 
sample and examinee proficiency values are observed. For the univariate case, noting that the 
conditional covariance equals of = E[ef] = E[(0u — 7 t Xi) ], this component is expressed as 
(X' DX)~ 1 a^, where D is a diagonal matrix of individual sampling weights. Note that On is the 
posterior expectation. The second component, taking into account that 9 t is not observed, is 
expressed as 

Var(E(j t \X, Y, 0 t )) = (X , DX)~ 1 X , D Var(O t \X, Y)DX(X’DX )~\ (46) 

because E(j t \X,Y,0 t ) = (. X'DX)~ 1 X'D0 t . Hence, the standard errors of regression effects in 
the univariate case can be approximated by 

Var( 7 t ) = ( XDX )“V t 2 + {X' DX)~ l X' DtDX(X’ DX )~ l , (47) 

where X = Var(9t\X, Y) is a diagonal matrix with student posterior variances. 

For the p -variate case, the covariances between effects across subscales also need to be 
estimated. The vector of student proficiencies 6, = (On, ■ ■ ■ , 0 ip )' for i = 1, • • ■ , N is assumed to 
have a common conditional variance matrix X. The variation due to sampling, Cov( 7 s ,7t), can 
be expressed as 

Cov( 7 s ,7t) = E [( 7 a - 7 s )( 7 t - 7 t )'] , (48) 

and the first component in (45) is computed as 

Cov( 7 s , 7 t ) = a st (X'DX)-\ (49) 


where cr s t is the ( s,t ) entry in the covariance matrix 



/ CTll 

P 12 

(713 ' ' 

■ (Ti p \ 

X = 

0 " 21 

0 " 22 

CJ 23 ' ' 

■ &2p 


V CT P 1 


&p3 

a PP ) 


The second component is similar to the univariate case, except that X is a collection of 
diagonal matrices. 


16 



3.2 Standard Errors for the HLRM 

The procedures employed in NAEP can be extended to the hierarchical latent regression 
model based on (45). The sampling variance (e.g., first component) of the estimate of -y t can be 
found following a development by Raudenbush and Bryk (2002, p. 42). Assuming Xj to be full 
column rank, the ordinary least square estimator of -y j t is 

7 jt = (X'jXjy'X'jdjt, (50) 

and the dispersion matrix is given by 

Var{%) = V jt = (X^Xj)- 1 **. (51) 

Multiplying both sides of the Level 1 model in (3) by (X' ■Xj)~ 1 X'j yields 

Ijt = Ijt + e jt, (52) 

where ejt r\j N(0,V jt ). Moreover, combining the Level 2 model in (4) yields 

Ijt = 7 + u jt + e jt . (53) 

The variance for in (53) is decomposed into two parts: one is the parameter dispersion of Ujt 
(e.g., Tt) and the other is the residual dispersion of ejt ( e.g., Vjt). Specifically, 

Var{ j jt ) = Var(u jt + e jt ) =T t + V jt , (54) 

is a Q x Q variance matrix that can be written as 

Var(j jt ) = A jt = T t + (X^Xj)- 1 ^. (55) 


In most educational surveys, school or cluster sample sizes are not balanced, and the values for 
A j t will differ from school to school. Assuming A is known, the unique, minimum variance, 
unbiased estimator of -y t will be the generalized least square estimator 

7t = (E A A) _1 E A A%f ( 56 ) 

3 =1 3 =1 

Subsequently, the variance of A f t is 

J 

Var(7 t ) = A 7/) _1 - ( 57 ) 

3 = 1 
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If A j t is not known, it can be estimated by (55) using T for T. 

The second component in (45) addresses the variation due to the latency of 6 and is equal to 
the variance of the posterior expectation of 7 t . The posterior expectation of qq is equivalent to 

E(lt\X,Y,e) = (58) 

3 =1 3 =1 

Subsequently, the variation of this expectation can be estimated by 


j 

Var(E(%\X,Y,d)) = 


{X'jXj 


r'x'jtjXjix'jXj] 


-1 


Ai 


V', 


(59) 


3 = 1 


ta V = [(£/,! AJ 1 )]--. 

It should be noted that the formula for computing E(Var(j t \X,Y t , 0 t )) in (57) and 
Var(E('Y t \X,Yt,0t)) in (59) are feasible if all Level 1 coefficients are considered random and 
every Level 2 unit contains an adequate sample to computate 7 ^ in (50). Raudenbush and Bryk 
(2002, pp. 44-45) stated that the formulae for computing standard errors for qq have rather 
limited use in practical applications. Following their derivations (pp. 44-45), the generalized least 
squares estimator for fixed effects is 


J \ 1 J 

E x iVe] X 3 E (60) 

3 =1 / 3 =1 

where 

V 0J = Var(e 3 ) = XjTXj + a 2 I. (61) 



Hence, the variance covariance matrix of for fixed effects estimates qq is 

-l 


J 


Var(%)= ( E*^* 
<3 =1 


3 


(62) 


If Xj is full rank, the result in (57) is equivalent to the result from (62), as proven by Raudenbush 
and Bryk (2002, p. 44). 


4 Application to NAEP Data 

The NAEP 2004 age 17 long-term trend mathematics assessment data has been analyzed 
with a two-level simple HLRM model without predictors on the second level. The data contains 
7,561 students, and a single scale is assumed under the IRT model employed here. There are 62 
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cluster units, with sample sizes for each unit ranging from 3 students to 291 students. There are 7 
clusters where the number of students is less than 30, and 5 clusters where it is less than 20. The 
regression parameter estimates, as stated before, are overall mean effects across clusters. Thus, 
the estimates of regression effects parameters are expected to be close to the estimates from the 
current NAEP approach. 

A large model and three small models are tested. The large model contains 156 independent 
predictors plus an intercept at the first-level model. The predictors are represented as principal 
components x* with standard deviations sd *, where sd\ > sd > • • • > sd* > • • • > sdq. These 
principal components are extracted from dummy codes indicating membership to many student 
groups. The three small models contain only student group membership indicators or contrasts 
for (a) gender (male vs. female) and (b) racial ethnicity (White, Black, Hispanic, Asian/others), 
and (c) gender + race/ethnicity, in addition to the intercept included in each model. 

An EM algorithm for parameter estimation of a simple HLRM model is implemented via a 
C++ program. 

4-1 Small Models 

Small Model 1: Gender. A small model was estimated, using students’ gender designations as 
predictors at the first level, modeling male versus female students. The residual variance estimate 
in this small model is .9145, obliviously greater than that of the large operational model containing 
157 principal components, which is .3737. The larger residual variance of this small model implies 
that a large amount of the variation of latent traits is not accounted by this indicator. The 
residual variance estimate under the current NAEP approach is .978, which is slightly larger than 
that from the HLRM approach, implying that the variation accounted for by the clusters is small 
with respect to the male/female distinction. 

The regression effect and intercept estimates and standard errors, as well as the estimates 
from the current NAEP approach, are given in Table 1. The intercept estimate is .075, and 
regression effect for female students is -.1104. The estimates of regression effects are very close 
to those from current NAEP approach, with slight differences appearing in the fourth decimal 
place. Both regression effect estimates (HLM and NAEP) for female are negative, implying male 
students generally perform better than female students. 

Columns 4 and 5 are standard errors estimates for the regression parameters, denoted by 
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Table 1 

HLM Parameter Estimates Versus NAEP (Gender) 


Variable 

NAEP ( 7 ) 

HLRM( 7 ) 

HLRM SE( 7 ) 

HLRM SEi( 7 ) 

Intercept 

.075 

.0752 

.0498 

.0496 

Female 

-.1104 

-.1105 

.0232 

.0226 


HLRMSE ( 7 ) and HLRMSE\ ( 7 ), respectively. HLRMSE\{ 7 ) involves sampling variation 
only, but HLRM SE( 7 ) incorporates both sampling and measurement variation. It shows from 
the table that almost all the variation for estimating the regression parameters is attributed to 
sampling; only a small amount of variation is due to the latency of a student ability. 

Small Model 2: Race/ethnicity. A small model was also estimated using students’ racial 
group designations as predictors at the first level, modeling White students versus other races 
of students (Black, Hispanic, Asian/others). The residual variance estimate in this small model 
is .8312, obliviously greater than that of the large operational model containing 157 principal 
components, which is .3737. The larger residual variance of this small model implies that a large 
amount of the variation of latent traits is not accounted for by indicators of race/ethnicity. The 
residual variance estimate under the current NAEP approach is .8674, which is slightly larger than 
that from the HLRM approach, implying that the variation accounted for by the clusters is small 
with respect to the racial distinctions. 

The regression effect, intercept estimates, and standard errors, as well as the estimates from 
the current NAEP approach, are given in Table 2. The intercept estimate is .208 and regression 
effect for Black students is -. 8688 ; for Hispanic students, -.6306; and for Asian/other students, 
.0691. The estimates of regression effects are very close to those from the current NAEP approach, 
with slight differences appearing in the fourth decimal place. Both regression effect estimates 
(HLRM and NAEP) for Black and Hispanic students are negative, implying White students 
generally perform better than Black and Hispanic students. 

Small Model 3: Gender + race/ethnicity. A small model was also estimated using both 
students’ gender and racial ethnicity designations as predictors at the first level, modeling male 
versus female and White versus other racial students (Black, Hispanic, Asian/others). The 
residual variance estimate in this small model is .8290, obliviously greater than that of the large 
operational model containing 157 principal components, which is .3737. The larger residual 
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Table 2 

HLM Parameter Estimates Versus NAEP (Racial) 


Variable 

NAEP (7) 

HLRM(7) 

HLRM SE(-y) 

HLRM SEi( 7) 

Intercept 

.208 

.2083 

.0358 

.0356 

Black 

-.868 

-.8688 

.0441 

.0433 

Hispanic 

-.630 

-.6306 

.0496 

.0488 

Asian/others 

.069 

.0691 

.0665 

.0652 


variance of this small model implies that a large amount of the variation of latent traits is not 
accounted for by indicators of racial ethnicity and gender. The residual variance estimate under 
the current NAEP approach is .8653, which is slightly larger than that from the HLRM approach, 
implying that the variation accounted for by the clusters is small with respect to the gender and 
racial distinctions. 

The regression effect, intercept estimates, and standard errors as well as the estimates from 
the current NAEP approach are given in Table 3. The intercept estimate is .2514. The regression 
effect for female students is -.0882; for Black students, -.8641; for Hispanic students, -.6279, and 
for Asian/other students, .0729. The estimates of regression effects are very close to those from 
the current NAEP approach, with slight differences appearing in the fourth decimal place. Both 
regression effect estimates (HLRM and NAEP) for female students are negative, implying male 
students generally perform better than female students. Both regression effects estimates (HLRM 
and NAEP) for Black and Hispanic students are negative, implying White students generally 
perform better than Black and Hispanic students. 


Table 3 

HLM Parameter Estimates Versus NAEP (Gender + Racial) 


Variable 

NAEP (7) 

HLRM(7) 

HLRM SE(7) 

HLRM SEi( 7) 

Intercept 

.2514 

.2516 

.03959 

.0393 

Female 

-.0882 

-.0882 

.0225 

.0219 

Black 

-.864 

-.8641 

.0432 

.0424 

Hispanic 

-.627 

-.6279 

.0491 

.0483 

Asian/others 

.0729 

.0729 

.0668 

.0656 
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J^.2 Large Model 

Model parameter estimates. A large model was estimated using 156 principal components 
extracted from the covariance matrix of the student group indicators employed in the current 
NAEP model. This model converged around the 100th EM cycle, with the total likelihood 
monotonically increasing until convergence, as is presented in Figure 1. The HLRM regression 
parameter estimates plus the intercept for the large model are given in Table 4, Table 5, Table 6, 
and Table 7 in column 5, denoted HLM{' y), along with current NAEP estimates of regression 
parameters in the second column, denoted as NAEP(~f). (Note that Table 4, Table 5, Table 6, 
and Table 7 actually constitute a long table, containing the results of parameter estimates with 
a large set of principle components extracted from operational NAEP analysis). The regression 
parameter estimates for these two approaches are very close to each other, which is expected, since 
the HLM regression effects are the overall mean estimates of the regression effects for each cluster. 
Figure 2 displays a plot of the regression effects estimates from HLRM versus those from NAEP. 


log likihood change over EM iterations 



0 50 100 150 200 

EM Iteration 


Figure 1 The log likelihood for the first 200 EM iterations. 

The estimate of the residual variance from the simple HLRM model is .3737, compared to 
.567 in the current NAEP model, which is expected because the HLRM takes the variation across 
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HLM Regression Coefficients Estimates Versus NAEP 



Figure 2 Comparison of HLM estimates for regression effects with NAEP. 

clusters into account. Not entirely expected is the formidable magnitude of this difference. The 
comparison of the residual variance estimates among the large model and the three small models 
is given in Table 8. 

Although the covariance matrix among random effects Tt has dimension 157, the computation 
of T t requires only the first two posterior moments of u. Therefore, the computations of T is 
stable as long as the computation of Ujt and Var(uj t \Cl) are stable. Because T t is a large matrix, 
the elements are not listed here. However, the diagonal components range from .000069 to .0142, 
which implies that there exist substantial differences between clusters, supporting the hierarchical 
model. It seems prudent to establish a mechanism to determine the significance of incorporating 
random effects in a given analysis. 

Standard errors. The standard error estimates from the simple two-level HLRM are expected 
to be higher than the current NAEP estimates because additional variation across clusters are 
accounted for. In Tables 4 through 7, column 3 denotes the NAEP standard error NAEPSE(' y), 
and column 6 denotes the HLRM standard error, HLMSE{fy). Furthermore, in Tables 4 through 
7, columns 4 and 7 show the standard error estimates for the part due to sampling only. From 
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Table 4 

HLRM Parameter Estimates Versus NAEP Estimates 


PCF 

NAEP (- y ) 

NAEPSE ( A ) 

NAEPSE ^ j ) 

HLM ( 7) 

HLMSEpt ) 

HLMSEAl ) 

1 

.0046 

.009 

.0087 

.0122 

.0219 

.0211 

2 

-.0374 

.001 

.0009 

-.0377 

.0022 

.0021 

3 

.0049 

.0013 

.0012 

.005 

.0032 

.0031 

4 

.0142 

.0016 

.0015 

.0137 

.0034 

.0033 

5 

-.0171 

.0018 

.0018 

-.0169 

.0035 

.0034 

6 

-.0461 

.0019 

.0018 

-.0474 

.0046 

.0045 

7 

-.0017 

.0019 

.0019 

-.0012 

.0024 

.0023 

8 

-.0235 

.0021 

.002 

-.0241 

.005 

.0049 

9 

-.001 

.0021 

.0021 

-.0009 

.0032 

.0031 

10 

.0013 

.0023 

.0022 

.0024 

.0039 

.0037 

11 

.0223 

.0023 

.0022 

.0226 

.0041 

.0039 

12 

.009 

.0023 

.0022 

.0085 

.0036 

.0035 

13 

-.0091 

.0025 

.0023 

-.007 

.0052 

.0049 

14 

.0149 

.0025 

.0024 

.0154 

.0042 

.0041 

15 

-.0071 

.0027 

.0026 

-.0076 

.0065 

.0064 

16 

.0158 

.0027 

.0026 

.0165 

.0034 

.0033 

17 

-.0011 

.0027 

.0026 

-.0005 

.0051 

.0049 

18 

-.0692 

.0028 

.0027 

-.0716 

.0044 

.0043 

19 

.0234 

.0028 

.0027 

.0236 

.0052 

.005 

20 

.0075 

.0029 

.0028 

.0077 

.0052 

.005 

21 

-.0207 

.0029 

.0028 

-.021 

.0062 

.006 

22 

-.0084 

.003 

.0029 

-.0088 

.0055 

.0053 

23 

-.0341 

.0031 

.003 

-.0353 

.006 

.0058 

24 

-.0122 

.0032 

.0031 

-.0113 

.0056 

.0054 

25 

.001 

.0033 

.0032 

.0007 

.0132 

.0129 

26 

.0032 

.0034 

.0033 

.003 

.0063 

.0061 

27 

.008 

.0035 

.0034 

.0085 

.0056 

.0054 

28 

.0093 

.0038 

.0036 

.0094 

.0078 

.0076 

29 

-.0324 

.0039 

.0037 

-.0334 

.0069 

.0066 

30 

.0085 

.0039 

.0038 

.0092 

.0067 

.0064 

31 

-.0241 

.004 

.0039 

-.0246 

.0075 

.0073 

32 

.0141 

.0042 

.004 

.0147 

.0074 

.0072 

33 

-.012 

.0043 

.0041 

-.0137 

.0081 

.0078 

34 

-.0285 

.0044 

.0042 

-.0281 

.009 

.0088 

35 

-.029 

.0044 

.0042 

-.0314 

.0076 

.0073 

36 

.0392 

.0044 

.0043 

.0409 

.0089 

.0085 

37 

.0083 

.0045 

.0043 

.0092 

.0093 

.009 

38 

.0043 

.0045 

.0044 

.0041 

.0081 

.0078 

39 

-.0233 

.0047 

.0045 

-.0235 

.0085 

.0082 

40 

.0416 

.0047 

.0045 

.0436 

.0081 

.0078 

41 

-.0378 

.0047 

.0046 

-.0383 

.0091 

.0088 

42 

.0164 

.0049 

.0047 

.0173 

.009 

.0086 

43 

.0181 

.0049 

.0048 

.0194 

.0087 

.0084 

44 

.0222 

.005 

.0048 

.0234 

.0086 

.0083 

45 

-.027 

.005 

.0048 

-.0278 

.0078 

.0075 

46 

.0155 

.0051 

.0049 

.0153 

.008 

.0077 

47 

.0171 

.0051 

.0049 

.0183 

.01 

.0096 
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Table 5 

HLRM Parameter Estimates Versus NAEP Estimates-Continued, 


PCF 

NAEP {7) 

NAEPSE(j) 

NAEPSE^j) 


HLMSE(j) 

HLMSEAl) 

48 

-.0367 

.0051 

.005 

-.0367 

.0091 

.0088 

49 

-.0059 

.0052 

.005 

-.0056 

.0091 

.0088 

50 

-.0015 

.0053 

.0051 

-.001 

.0082 

.0078 

51 

-.0276 

.0053 

.0052 

-.0286 

.0086 

.0083 

52 

-.0027 

.0054 

.0052 

-.0017 

.0116 

.0113 

53 

-.0043 

.0054 

.0052 

-.0044 

.0093 

.009 

54 

.02 

.0055 

.0053 

.0216 

.0094 

.0091 

55 

-.0113 

.0056 

.0054 

-.0115 

.0103 

.01 

56 

-.0106 

.0056 

.0054 

-.0109 

.0088 

.0085 

57 

-.0212 

.0057 

.0055 

-.0216 

.0096 

.0093 

58 

-.0069 

.0057 

.0055 

-.0074 

.0085 

.0082 

59 

.0088 

.0058 

.0056 

.0084 

.0076 

.0073 

60 

-.0015 

.0058 

.0056 

-.001 

.0103 

.0098 

61 

-.0175 

.0059 

.0057 

-.0184 

.0082 

.0079 

62 

.0214 

.0059 

.0057 

.023 

.01 

.0096 

63 

-.002 

.006 

.0058 

-.0022 

.0104 

.01 

64 

.0137 

.0061 

.0059 

.014 

.0126 

.0122 

65 

.0144 

.0062 

.006 

.0157 

.0089 

.0085 

66 

.0107 

.0062 

.006 

.0109 

.0123 

.0119 

67 

-.0048 

.0063 

.0061 

-.0034 

.0117 

.0113 

68 

-.0265 

.0064 

.0061 

-.0289 

.0113 

.0109 

69 

-.0022 

.0064 

.0062 

-.003 

.0126 

.0122 

70 

-.0144 

.0064 

.0062 

-.0164 

.0126 

.0123 

71 

.0072 

.0065 

.0063 

.0088 

.0107 

.0104 

72 

-.0003 

.0065 

.0063 

0 

.0094 

.009 

73 

-.0505 

.0066 

.0063 

-.0504 

.0122 

.0117 

74 

.0232 

.0066 

.0064 

.0229 

.0122 

.0118 

75 

-.0756 

.0067 

.0064 

-.0777 

.0094 

.009 

76 

-.032 

.0068 

.0065 

-.032 

.0116 

.0112 

77 

-.0246 

.0068 

.0066 

-.0259 

.0112 

.0107 

78 

.0023 

.0069 

.0066 

.0038 

.0148 

.0145 

79 

.001 

.0069 

.0067 

.0025 

.0108 

.0103 

80 

.0694 

.007 

.0068 

.072 

.0102 

.0097 

81 

.0073 

.0071 

.0068 

.0067 

.0126 

.0121 

82 

-.0445 

.0071 

.0068 

-.0435 

.0135 

.0131 

83 

.0064 

.0071 

.0069 

.0068 

.0138 

.0133 

84 

.0118 

.0072 

.0069 

.0115 

.0124 

.0119 

85 

.0215 

.0072 

.007 

.0209 

.0139 

.0134 

86 

.0292 

.0073 

.007 

.0312 

.0114 

.0109 

87 

.0207 

.0073 

.0071 

.0197 

.014 

.0136 

88 

-.031 

.0074 

.0071 

-.0321 

.0139 

.0135 

89 

.024 

.0075 

.0072 

.0225 

.011 

.0106 

90 

-.0477 

.0075 

.0072 

-.0478 

.0133 

.0129 

91 

.0016 

.0075 

.0072 

.001 

.0123 

.0119 

92 

-.0203 

.0075 

.0073 

-.0205 

.0116 

.0112 

93 

-.0128 

.0076 

.0073 

-.0117 

.0105 

.01 

94 

.0329 

.0076 

.0073 

.0339 

.0132 

.0127 
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Table 6 

HLRM Parameter Estimates Versus NAEP Estimates—Continued 


PCF 

NAEPpy) 

NAEPSEpy ) 

NAEPSEiiA) 

HLMp ) 

HLMSEpy) 

HLMSEipy) 

95 

.0216 

.0077 

.0075 

.0227 

.0135 

.013 

96 

.0324 

.0078 

.0075 

.033 

.0138 

.0134 

97 

-.0108 

.0078 

.0075 

-.0095 

.0129 

.0124 

98 

.0199 

.0078 

.0075 

.0209 

.0116 

.0112 

99 

-.0388 

.0078 

.0075 

-.0412 

.0113 

.0108 

100 

.029 

.0079 

.0076 

.0292 

.013 

.0125 

101 

.0241 

.008 

.0077 

.0245 

.0143 

.0138 

102 

-.0048 

.008 

.0077 

-.0056 

.0128 

.0124 

103 

-.014 

.0081 

.0078 

-.015 

.0136 

.0131 

104 

-.0031 

.0081 

.0078 

-.0039 

.0121 

.0116 

105 

-.004 

.0082 

.0079 

-.0044 

.0151 

.0147 

106 

-.0148 

.0082 

.0079 

-.014 

.0118 

.0113 

107 

-.023 

.0082 

.0079 

-.0225 

.0128 

.0124 

108 

-.0107 

.0083 

.0079 

-.0116 

.0123 

.0117 

109 

.0008 

.0083 

.008 

.0011 

.0133 

.0128 

110 

.0237 

.0083 

.008 

.0238 

.0148 

.0142 

111 

.0137 

.0083 

.008 

.0142 

.0149 

.0144 

112 

-.0101 

.0083 

.008 

-.0098 

.016 

.0156 

113 

-.0163 

.0084 

.0081 

-.018 

.0139 

.0134 

114 

-.0419 

.0084 

.0081 

-.0426 

.0141 

.0136 

115 

.0104 

.0084 

.0081 

.0125 

.0144 

.014 

116 

-.0303 

.0085 

.0082 

-.0316 

.0153 

.0149 

117 

.0227 

.0085 

.0082 

.0226 

.0151 

.0146 

118 

.0114 

.0086 

.0083 

.0139 

.0168 

.0163 

119 

-.0174 

.0086 

.0083 

-.0165 

.0142 

.0137 

120 

-.0335 

.0086 

.0083 

-.0341 

.0153 

.0147 

121 

-.0079 

.0087 

.0084 

-.009 

.0183 

.0177 

122 

-.0052 

.0087 

.0084 

-.0046 

.0164 

.0159 

123 

-.0297 

.0088 

.0084 

-.0306 

.0172 

.0167 

124 

-.0484 

.0088 

.0085 

-.0496 

.015 

.0144 

125 

-.0125 

.0088 

.0085 

-.0118 

.0182 

.0176 

126 

-.0356 

.0089 

.0086 

-.037 

.0138 

.0132 

127 

.0248 

.0089 

.0086 

.0266 

.0123 

.0117 

128 

-.0075 

.009 

.0086 

-.0067 

.0144 

.0139 

129 

.0199 

.009 

.0087 

.0205 

.0138 

.0132 

130 

.0045 

.0091 

.0087 

.0035 

.0147 

.0141 

131 

.0121 

.0091 

.0087 

.0139 

.0129 

.0123 

132 

-.0072 

.0091 

.0088 

-.0097 

.0186 

.018 

133 

.0352 

.0091 

.0088 

.0349 

.0147 

.0141 

134 

.0412 

.0092 

.0088 

.0416 

.0207 

.0202 

135 

.0129 

.0092 

.0089 

.011 

.0149 

.0145 

136 

.0049 

.0093 

.0089 

.0028 

.0159 

.0153 

137 

-.0124 

.0093 

.009 

-.012 

.017 

.0163 

138 

-.0187 

.0093 

.009 

-.0201 

.0202 

.0198 

139 

-.0333 

.0093 

.009 

-.0335 

.0152 

.0147 

140 

-.0086 

.0094 

.009 

-.0093 

.0155 

.0149 

141 

-.0095 

.0094 

.0091 

-.0112 

.0165 

.0159 
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Table 7 

HLRM Parameter Estimates Versus NAEP Estimates—Continued 


PCF 

NAEP {7) 

NAEPSEpi) 

NAEPSEAj) 

HLM {d) 

HLMSE( 7) 

HLMSEAl) 

142 

.0098 

.0095 

.0091 

.0088 

.0142 

.0136 

143 

-.0208 

.0095 

.0091 

-.0216 

.0149 

.0144 

144 

.0166 

.0095 

.0092 

.0172 

.0169 

.0164 

145 

.033 

.0096 

.0093 

.0319 

.017 

.0164 

146 

.0078 

.0096 

.0093 

.0076 

.0135 

.013 

147 

-.0007 

.0097 

.0093 

-.0001 

.0137 

.0132 

148 

-.0014 

.0097 

.0094 

-.003 

.0157 

.0151 

149 

-.0179 

.0098 

.0094 

-.0174 

.0174 

.0169 

150 

-.0217 

.0098 

.0095 

-.0218 

.0154 

.0149 

151 

-.0032 

.0099 

.0096 

-.0029 

.0155 

.015 

152 

.0074 

.0099 

.0096 

.0077 

.0132 

.0126 

153 

.0002 

.0101 

.0097 

.0004 

.0154 

.0149 

154 

.0098 

.0101 

.0097 

.0085 

.0147 

.0142 

155 

-.0184 

.0101 

.0097 

-.02 

.0157 

.0152 

156 

-.028 

.0101 

.0098 

-.0263 

.0177 

.0171 

157 

-.0155 

.0102 

.0098 

-.0143 

.0145 

.0138 


Table 8 

HLM Residual Variance Estimates d 2 Versus NAEP 


Model 

Large 

Gender 

Racial 

Gender + Racial 

HLM 

.3737 

.9146 

.8313 

.8290 

NAEP 

.5673 

978 

.8674 

.8653 


these columns it can be seen that the HLRM standard errors are substantially larger than standard 
errors from the current NAEP approach, falling a range between 2 and 10 times as large. Hence, 
as expected the current NAEP standard error estimates for regression effects are underestimates. 

As stated in Section 3, the variance of 7 contains both variation due to sampling and variation 
due to measurement errors for student abilities. For NAEP estimates for standard errors, the 
variation due to sampling is much larger than the variation due to the latency, where the former 
typically accounts for about 90% to 95% of the estimates. Similar pattern of results are found 
with the HLRM model standard error estimates, although the proportion due to sampling is 
higher. In Tables 4 through 7, the variation due to sampling is specifically listed as HLMSE\( 7 ) 
for the HLRM model and NAEPSE\{f() for the NAEP model. The proportion due to sampling 
in the HLRM model ranges from 94.73% to 97.83%, slightly higher than the proportions within 
the current NAEP approach, partly due to the variation across clusters being accounted for and 
added to the sampling variation. 
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5 Discussion and Conclusion 

In this paper a hierarchical latent regression model has been developed for use in large-scale 
assessments such as NAEP. The primary purpose of this model is to account for the hierarchical 
nature of the sample, hence, improving regression parameters and standard errors estimates. 

A simple two-level HLRM model discussed in this paper can be easily extended to more 
general two-level and/or higher level hierarchical linear models incorporating IRT modeling for 
student latent abilities, as is the case for the current NAEP models. The HLRM is naturally 
adapted from the current NAEP model, as students are naturally nested within schools. 

The regression effect estimates from a simple two-level HLRM model can be compared 
directly with the current NAEP estimates. However, they are by design more appropriate, since 
school clustering is taken into account. Some indication of this is provided by a crude comparison 
of standard errors, which appear under the HLRM to be at least twice the size of the current 
NAEP estimates. Another indicator is the fact that the residual variance decreased in the simple 
HLRM model (e.g., the three small models and the large model discussed above) compared to the 
current NAEP estimates, which is expected, since the random effects term across clusters accounts 
for variation that is otherwise attributed to unexplained variation. 

However, the interest was not limited to parameter estimates, but also included the general 
feasibility of estimating these parameters. Under the proposed formulation of the HLRM, the T 
matrix estimate provides some indication of the virtues of employing a hierarchical model. First 
of all, in the applications presented in this study, the diagonal of this matrix was substantial, 
indicating a nontrivial random effect. Second, no specific problems were encountered during 
estimation. Specifically, the estimation of T involves only computing Ujt, and Var(ujt\fl), the 
posterior mean and variance of Ujt and can yield a numerically stable estimate as long as those 
moments can be computed in a preceding stage. 

With the same convergence criterion, the EM algorithm in the current NAEP procedure takes 
six iterations to reach convergence, while the algorithm for HLM estimation needs many more 
cycles to reach convergence (more than 50 in this example) for the large model discussed. Figure 
1 is a plot of likelihood changes over the first 200 EM iterations. It shows that the log likelihood 
function is monotonically increasing over the first 200 iterations, which implies that the likelihood 
increases monotonically cycle by cycle as well. One possible explanation for this slow convergence 
could be the insufficient number of students in some clusters. As is addressed before, we have 62 
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clusters in this example, but 5 cluster units have less than 20 students and 7 cluster units have 
less than 30 students. 


6 Future Research Directions 

The purpose of regression parameter estimates and standard error estimates is to provide 
reasonable estimates for NAEP target statistics for population characteristics (e.g., mean scale 
scores for subgroups, percentage above a certain level of performance achievement). How the 
parameter estimation in the simple HLRM model would affect NAEP reporting scale scores and 
their standard errors for each subgroup is of great interest and currently under study. 

An important assumption of the HLRM model is that the item parameters are assumed to be 
fixed or estimated without errors, which is obviously an unsatisfactory statement. Simultaneous 
estimation of IRT item parameters and HLRM regression parameters also constitutes work in 
progress. 
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Appendix 

In the multivariate case, regression effects T will become a matrix including the regression 
effects for each subscale (i.e., T is a Q xp matrix [ 7 J, • • • , I 7 ] with each subscale regression effects 
■y f having Q components for t = 1, • • • , p ). Let Xij be the collection (or a row) of background 
variables for student i in school j in the p-scale assessment, then the likelihood function L for N 
students’ responses to n items in the test given school random effects U j is the total marginal 
likelihood, and is expressed as 


J n o 

L = log 

j=l i =1 
J n.j 

EE Wijlog I’iyij r'ijx'ij + U'jx'ij, £)] 

3 =1 i= 1 
J n j 

3 = 1 * =i 


P(v ij \0)mr ij x' ii + U'x'i, V)dO 


' j ij’’ 


(63) 


m T'ijx'ij + C/'ccE,£) represents for the conditional multivariate normal density with mean 
vector T'ijX 1 ^ + t/'ccL and covariance matrix £ (i.e., 0\Uj ~ 

w ijX[■ + U'.jX L, £)). Denote the expectation of 6 by y g = Y' ijx\- + UjX L, then the density 
function is given by 


^( 0 |/* 0 >£) = 7 — l ,1 exp 

(27r) 2 |£| 2 


-±(e - - n e ) 


The partial derivative of log(t>(0 |/zg,£) with respect to T' is 

dlog(b(0\n 0 , £ 


Therefore, 


dr' 


= T,-\e-Y'x' ij )x' ij . 


dL 

dr 7 


j n i 

^3 

3 =1 »—1 
J ? b' 

3 =1 i=l 
J n 


r nVijWm /**, £) dlog<f>{0\to, S) .. 

'LL)a 1 __, , . _ _ . CLw 


P{Vij) 

P( VlJ |0)0(0|^,£) 


P(y 


dr' 

£ _i (0 — n e )xijdd 


ij) 


f P{ e \Vij V- Vo) x ijdO 

i =1 i=l ■ 


3=1 i= 

J n . 

^^wijX-He-r'x'j-Ujx' 

j= 1 i=l 


ij 


(64) 


(65) 
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( 66 ) 



Set (57) to 0, and then we can obtain the estimates of each subscale 7 t , t = 1, • • • ,p, as is 
given by 


J n 3 


J n i 


It = 

K j=i i =i 

To obtain the estimates of XI, it follows that 

J n 3 

EE 


y ^ y v w ij x ij x ij ) y ^ y ^ x ij & u jt x %j ) • 

j =1 i=l 


(67) 


dL 

3X 


f P(yij\0)</>(0\lJ>e>'Ee) dlog(/>(6\n e ,'Ee) 

'UJ'ij I / \ ^ _ City 


j =1 i=l 
J 


i’ll/. 


IJJ 


SX 


EE»« [de. 


3X 


( 68 ) 


i=i *=i 

In the multivariate case, Uj, like T, is a matrix formed by columns of school random effects 
for each subscale (i.e., Uj = [uij\,U 2 j\, ■ ■ ■ \,u p j\). Now it becomes more convenient to find 
the derivatives of log(f)(0\nQ, X) with respect to X, a symmetric matrix. Denote the matrix 
E = X _1 (0 — Hq)(0 — /Lt^yX -1 , then the partial derivative of log(f>(0\fi e , X) with respect to X can 
be expressed as 


dlog4>(G\ix d ,T,) _ l, ov ,_i , 1 ro „ j. 

— — 2 — diagT, ) + - [2 ^ — diag^.] 

= l -diag [X- 1 - E] - [X- 1 - E] . 

Now denote the matrix S as 

J n 3 J n 3 

S X X w v = X X Wi J / P ( 0 I y%j )( 0 ~ Ve){ e ~ Ve)' de - 

j=1 i=1 j =1 i=l ' 

Following the same procedure given by Mislevy (1984, p. 366), 

_ J Tin J m 

f)T 1 J 3 J o 

— = — -diag [X^(X - 5)X~ 1 ] ]T £ Wij - X^(X - 5)X^ £ £ w iy 

j =1 i= 1 j =1 i =1 


(69) 


(70) 


(71) 


Set the above expression to 0, and then X = S. Substituting X into 70 and further simplify the 
equation in 70 will yield 


J n 3 J n 3 

E XX^' = / P( y°\yij)( e ~ Po)( e “ Ve)' de 

j =1 i=l 7=1 i=l ^ 


J=i *=r 

J n 3 n 

XX Wi i / p ( 6 , lyii)( 0 -^ + ^- 7 t e)( 0 -^ + ^-/ x e ) /d0 

i=i *=1 J 
J J n 3 

'y ^ /* ] — l*>o)(0ij — Vo) • (12) 

jf = l 7=1 J = 1 7=1 
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Therefore, the MML estimates for XI is given by 


S = 


Zj=l X)i=l w ij^ij + XX=1 Zi=l w ij(Oij A 1 #/ 


ZU TZi™ 


13 


ZU ZZi Wijtjj + Zj=i ZZi - !%■ - - T'x'jj - U'jx'ij)' 

Zj =1 X)i=l w ij 


( 73 ) 
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